
19 research outputs found

Morrigan: A Composite Instruction TLB Prefetcher

Author: Alvarez Lluc
Casas Marc
Grot Boris
Jiménez Daniel A.
Vavouliotis Georgios
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

The effort to reduce address translation overheads has typically targeted data accesses since they constitute the overwhelming portion of the second-level TLB (STLB) misses in desktop and HPC applications. The address translation cost of instruction accesses has been relatively neglected due to historically small instruction footprints. However, state-of-the-art datacenter and server applications feature massive instruction footprints owing to deep software stacks, resulting in high STLB miss rates for instruction accesses. This paper demonstrates that instruction address translation is a performance bottleneck in server workloads. In response, we propose Morrigan, a microarchitectural instruction STLB prefetcher whose design is based on new insights regarding instruction STLB misses. At the core of Morrigan there is an ensemble of table-based Markov prefetchers that build and store variable length Markov chains out of the instruction STLB miss stream. Morrigan further employs a sequential prefetcher and a scheme that exploits page table locality to maximize miss coverage. An important contribution of the work is showing that access frequency is more important than access recency when choosing replacement candidates. Based on this insight, Morrigan introduces a new replacement policy that identifies victims in the Markov prefetchers using a frequency stack while adapting to phase-change behavior. On a set of 45 industrial server workloads, Morrigan eliminates 69% of the memory references in demand page walks triggered by instruction STLB misses and improves geometric mean performance by 7.6%.

This work is partially supported by the Spanish Ministry of Science and Technology through the PID2019-107255GB project, the Generalitat de Catalunya (contract 2017-SGR-1414), the NSF grant CCF-1912617, the Semiconductor Research Corporation grant 2936.001, and generous gifts from Intel Labs. Georgios Vavouliotis has been supported by the Spanish Ministry of Economy, Industry and Competitiveness and the European Social Fund under the FPI fellowship No. PRE2018-087046. Marc Casas has been supported by the Spanish Ministry of Economy, Industry and Competitiveness under the Ramon y Cajal fellowship No. RYC-2017-23269.

Peer Reviewed. Postprint (author's final draft).
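The Morrigan abstract above describes table-based Markov prefetching over the STLB miss stream with frequency-driven replacement. The following is only a toy sketch of that general idea, not the paper's design: it assumes a single first-order table (Morrigan uses an ensemble with variable-length chains) and plain least-frequently-used eviction (Morrigan uses a frequency stack that adapts to phase changes). The class name `MarkovPrefetchTable` and all of its parameters are hypothetical.

```python
class MarkovPrefetchTable:
    """Toy first-order Markov table over a stream of missed page numbers.

    Assumption: one table, one predecessor per transition, and plain
    least-frequently-used eviction. This is an illustration only.
    """

    def __init__(self, capacity, successors_per_entry=2):
        self.capacity = capacity
        self.successors_per_entry = successors_per_entry
        self.successors = {}  # page -> {next_page: transition count}
        self.frequency = {}   # page -> number of misses seen for that page
        self.previous = None  # page of the most recent miss

    def _evict_least_frequent(self):
        # Frequency-based victim selection: drop the entry missed least often.
        victim = min(self.frequency, key=self.frequency.get)
        del self.frequency[victim]
        del self.successors[victim]

    def access(self, page):
        """Record a miss on `page` and return the pages predicted to miss next."""
        if page not in self.successors:
            if len(self.successors) >= self.capacity:
                self._evict_least_frequent()
            self.successors[page] = {}
            self.frequency[page] = 0
        self.frequency[page] += 1

        # Learn the transition previous -> page, unless previous was evicted.
        prev = self.previous
        if prev is not None and prev in self.successors:
            counts = self.successors[prev]
            counts[page] = counts.get(page, 0) + 1
        self.previous = page

        # Predict the most frequently observed successors of this page.
        ranked = sorted(self.successors[page].items(), key=lambda kv: -kv[1])
        return [p for p, _ in ranked[: self.successors_per_entry]]
```

Feeding the table a repeating miss stream such as 1, 2, 3, 1, 2, 3 makes it learn the cycle, so a later miss on page 1 yields page 2 as the prefetch candidate.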

UPCommons. Portal del coneixement obert de la UPC

Edinburgh Research Explorer

Morrigan: A composite instruction TLB prefetcher

Author: Alvarez Martí Lluc
Casas Guix Marc
Grot Boris
Jiménez Daniel A.
Vavouliotis Georgios
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

UPCommons. Portal del coneixement obert de la UPC

Peachy Parallel Assignments (EduHPC 2018)

Author: Alvarez Lluc
Ayguadé Eduard
Banchelli Fabio
Bunde David P.
Burtscher Martin
González Escribano Arturo
Gutierrez Julian
Joiner David A.
Kaeli David
Previlon Fritz
Rodríguez Gutiez Eduardo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Peachy Parallel Assignments are a resource for instructors teaching parallel and distributed programming. These are high-quality assignments, previously tested in class, that are readily adoptable. This collection of assignments includes implementing a subset of OpenMP using pthreads, creating an animated fractal, image processing using histogram equalization, simulating a storm of high-energy particles, and solving the wave equation in a variety of settings. All of these come with sample assignment sheets and the necessary starter code.

Department of Computer Science (Computer Architecture and Technology, Computer Science and Artificial Intelligence, Languages and Computer Systems). Purpose: to facilitate the inclusion of practical parallel programming exercises in Parallel Computing or High-Performance Computing (HPC) courses. Conference paper: description of practical exercises with access to previously developed and tested material.

Crossref

UPCommons. Portal del coneixement obert de la UPC

Repositorio Documental de la Universidad de Valladolid

Kean Digital Learning Commons

Adaptive power shifting for power-constrained heterogeneous systems

Author: Alvarez Martí Lluc
Bertran Monfort Ramon
Bose Pradip
Buyuktosunoglu Alper
Moreto Planas Miquel
Ortega Carrasco Cristobal
Rosedahl Todd Jon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/05/2022

The number and heterogeneity of compute devices, even within a single compute node, has been steadily on the rise. Since all systems must operate under a power cap, the number of discrete devices that can run simultaneously at their highest frequency is limited by the globally-imposed power cap. Current systems incorporate a centralized power management unit that statically controls the distribution of power among the devices within the node. However, such static distribution policies are unaware of the dynamic utilization profile across the devices, which leads to unfair power allocations that end up degrading system throughput performance. The problem is particularly acute in the presence of heterogeneity since type-specific performance-boost capabilities cannot be leveraged via utilization-agnostic static power allocations. This paper proposes Adaptive Power Shifting for multi-accelerator heterogeneous systems (APS), a technique that leverages system utilization information to dynamically allocate and re-distribute power budgets across multiple discrete devices. Democratizing the power allocation based on dynamic needs results in dramatic speedup over a need-agnostic static allocation. We use APS in a real OpenPOWER compute node with 2 CPUs and 4 GPUs to demonstrate the value of on-demand, equitable power allocations. Overall, the proposed solution increases performance with respect to two state-of-the-art techniques by up to 14.9% and 13.8%.

This work has been partially supported by the European Union’s Horizon 2020 research and innovation program under the Mont-Blanc 2020 project (grant agreement 779877), by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB-C22), by Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272) and by the IBM/BSC Deep Learning Center initiative. Ll. Alvarez has been supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under the Juan de la Cierva Formacion fellowship No. FJCI-2016-30984. M. Moreto has been supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship No. RYC-2016-21104.

Peer Reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

OpenCL-based FPGA accelerator for semi-global approximate string matching using diagonal bit-vectors

Author: Aguado Puig Quim
Alvarez Martí Lluc
Castells Rufas David
Espinosa Morales Antonio
Marco-Sola Santiago
Moreto Planas Miquel
Moure López Juan Carlos
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

An FPGA accelerator for the computation of the semi-global Levenshtein distance between a pattern and a reference text is presented. The accelerator provides an important benefit to reduce the execution time of read-mappers used in short-read genomic sequencing. Previous attempts to solve the same problem in FPGA use the Myers algorithm following a column approach to compute the dynamic programming table. We use an approach based on diagonals that allows for some resource savings while maintaining a very high throughput of 1 alignment per clock cycle. The design is implemented in OpenCL and tested on two FPGA accelerators. The maximum performance obtained is 91.5 MPairs/s for 100 × 120 sequences and 47 MPairs/s for 300 × 360 sequences, the highest ever reported for this problem.

This research was supported by the EU Regional Development Fund under the DRAC project [001-P-001723], by the MINECO-Spain (contract TIN2017-84553-C2-1-R), by the MICIU-Spain (contract RTI2018-095209-B-C22) and by the Catalan government (contracts 2017-SGR-1624, 2017-SGR313, 2017-SGR-1328). M.M. was partially supported by the MINECO under RYC-2016-21104. We thank Intel for granting us access to the DevCloud system and letting us join the HARP research program. The presented HARP-2 results were obtained on resources hosted at the Paderborn Center for Parallel Computing (PC2) in the Intel Hardware Accelerator Research Program (HARP2).

Peer Reviewed. Postprint (author's final draft).
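For readers unfamiliar with the quantity the accelerator above computes, the following is a minimal software sketch of the semi-global Levenshtein distance, assuming the usual definition in which the pattern may start and end anywhere in the text at no cost. It uses the plain row-by-row dynamic programming table, not the Myers bit-vector algorithm or the diagonal bit-vector scheme of the paper, and the function name `semi_global_distance` is hypothetical.

```python
def semi_global_distance(pattern, text):
    """Minimum edit distance between `pattern` and any substring of `text`.

    Plain O(len(pattern) * len(text)) dynamic programming, kept to two rows.
    """
    # Row 0 is all zeros: the match may begin at any text position for free.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        # Column 0 costs i: the first i pattern characters are unmatched.
        curr = [i]
        for j, tc in enumerate(text, start=1):
            cost = 0 if pc == tc else 1
            curr.append(min(prev[j] + 1,          # skip a pattern character
                            curr[j - 1] + 1,      # skip a text character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    # Taking the minimum of the last row lets the match end anywhere.
    return min(prev)
```

For example, `semi_global_distance("abc", "xxabdxx")` evaluates to 1, since the pattern aligns to the substring "abd" with one substitution.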

UPCommons. Portal del coneixement obert de la UPC

Observation of second sound in a rapidly varying temperature field in Ge

Author: Alonso Maria Isabel
Alvarez F. X.
Bafaluy Javier
Beardo Albert
Camacho Juan
Colombo Luciano
López-Suárez Miquel
Melis Claudio
Pérez Luis Alberto
Reparaz Juan Sebastián
Rurali Riccardo
Sendra Lluc
Publication date: 01/01/2021

Second sound is known as the thermal transport regime where heat is carried by temperature waves. Its experimental observation was previously restricted to a small number of materials, usually in rather narrow temperature windows. We show that it is possible to overcome these limitations by driving the system with a rapidly varying temperature field. This effect is demonstrated in bulk Ge between 7 kelvin and room temperature, studying the phase lag of the thermal response under a harmonic high frequency external thermal excitation, addressing the relaxation time and the propagation velocity of the heat waves. These results provide a new route to investigate the potential of wave-like heat transport in almost any material, opening opportunities to control heat through its oscillatory nature.

Comment: After careful revision we have ruled out the presence of coherent noise and of any other noise source within the reported data. We have updated the manuscript providing a detailed analysis of the photoreflectance signal, demonstrating with experiments its thermal origin.

arXiv.org e-Print Archive

PubMed Central

Digital.CSIC

Diposit Digital de Documents de la UAB

The DeepHealth Toolkit: A Unified Framework to Boost Biomedical Applications

Author: Allegretti Stefano
Alvarez Lluc
Ander Gómez Jon
Badouh Asaf
Bolelli Federico
Canalini Laura
Cancilla Michele
Carrión Salvador
Enrico Piras Marco
Grana Costantino
Leo Simone
Marco-Sola Santiago
Moreto Miquel
Paredes Palacios Roberto
Pireddu Luca
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2021

Given the overwhelming impact of machine learning over the last decade, several libraries and frameworks have been developed in recent years to simplify the design and training of neural networks, providing array-based programming, automatic differentiation and user-friendly access to hardware accelerators. None of those tools, however, was designed with native and transparent support for Cloud Computing or heterogeneous High-Performance Computing (HPC). The DeepHealth Toolkit is an open source Deep Learning toolkit aimed at boosting productivity of data scientists operating in the medical field by providing a unified framework for the distributed training of neural networks, which is able to leverage hybrid HPC and cloud environments in a transparent way for the user. The toolkit is composed of a Computer Vision library, a Deep Learning library, and a front-end for non-expert users; all of the components are focused on the medical domain, but they are general purpose and can be applied to any other field. In this paper, the principles driving the design of the DeepHealth libraries are described, along with details about the implementation and the interaction between the different elements composing the toolkit. Finally, experiments on common benchmarks prove the efficiency of each separate component and of the DeepHealth Toolkit overall.

UPCommons. Portal del coneixement obert de la UPC

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Famílies botàniques de plantes medicinals (Botanical families of medicinal plants)

Author: Alcoverro Godoy Carmen
Alles Pascual Roser
Alvarez Lorenzo Paula
Anglí Herrero Oriol
Bagan Perez Nuria
Balaguer Pi Nuria
Balcells Mestre Maria
Ballesté Márquez Jéssica
Bartolomé Schneider Carla
Batlle de Balle Mercadé Laura
Belmonte Llorens Judit
Bujedo Moreno Sergi
Caballero Roman Aitor
Calafat Gestoso Mario
Camps Vilar Nuria
Cardús Agra Aida
Casas Serrano Rut
Castanheira Margarida
Castilla i Amorós Laia
Castro de la Cortina Laia
Chavero Pieres Marta
Ciocia Nicola
Collado Lorenzo Jessica
Corominas Auguets Mònica
Costa Santamaria Berta
Dalmases Gener Marc de
Dios Regadera Montserrat de
Domingo Gesteiro Adrián
Dorado Cordero Desirée
Escrigas Albó Helena Evangelista
Farré Carrera Laia
Farré Segura Jordi
Fernández Catalán Miren
Fernández Martínez Gerard
Ferreras Barrero Oriol
Forcen Arenas Meritxell
Formiga Ribas Estel
Fragüet Español Beatriz
Fuertes Flores Lara
Fàbregas Vàzquez Júlia
Garcia i Salvador Nestor
García Marquina Cristian
Garrell Soler Gemma
Garrido Lopez Ainoa
Giménez García del Moral María
Gispert Latorre Laia
Gomez de la Peña Celia
Gomez-Guiu Hormigos Ma. Lourdes
Grau Calzada Victoria
Grau Ortiz Miquel
Grima Arcos Núria
Gómez Fusté Clàudia Maria
Herráez Nieto Silvia
Hurtado Espino Silvia
Iglesias Rodrigo Mireia
Izquierdo Pérez Noelia
Kaichouh Agrirch Mimoun
Labraña Sánchez Carme
Lara Arteaga Maria Dolores
Lasurt Barés Claudia
Llarden Mediavilla Arnau
Llibre Perez Monica
Llibre Perez Sandra
Llorca Lorenzo Sonia
Llorente Lopez Xavier
Luque Castro Adrià
Luque Salvat Marc
López Alonso Javier
López Ruiz Sergio
Madurell i Blanes Laura
Manouchehri Aminian S.
March Rodríguez Elena
Martell Alonso Clàudia
Martinez Alguacil Helena
Martinez Bosch Laia
Martínez Riveros Héctor
Martínez Samitier Àlex
Mas Rincón Irene
Matas Ayala Anaïs
Miranda Jimenez Cristina
Molas Casellas Júlia
Molina Trullàs Júlia
Moner Gomez Sofia
Moral Anter David
Morera Nadal Júlia
Moya Martinez Mari Carmen
Munte Jesus Guillem
Nadal Serrano Maria de Lluc
Narvaez Serrano Daniel Lluis
Navarro Pinin Laura
Nevado Maza Sara
Oliver Sintes Cristóbal
Oros Olondriz Alberto
Ortega Herrero Natalia
Ortega Moreno Angel
Pachón Díaz Carles
Pagans Llivina Silvia
Pau Parra Alba
Perez Prats Marc
Pino Alamos Maria Pilar
Pons Hospital Santiago
Pucuji Tierra Lizeth Estefaní
Puig Puig Júlia
Raventós Aymar Cristina
Redondo Vahle Ana
Reyner Parra Andrés Joaquím
Riba Baqués Marta
Rio Martinez Helena
Rodriguez Isidro Pol
Roig Rossello Mariona
Roig Turner Gemma
Rojas Ligeron John Henry
Ros Peña Alba
Rosendo Masià Cristina
Rubio Petit Núria
Ruiz Avila Genesis
Ruíz Mateo Héctor
Sanchez Perez Cristina
Santomà Cardús Alex
Sanz Peñalver Sara
Soler Mallart Guillem
Tanyà Rovira Anna
Thorson Bofarull Leif
Torres Solera Olga
Torres Vila Maria
Turu Pedrola Marta
Vega Rodríguez Laura
Veiret Duart Gabriel
Vela Pérez Miriam
Ventura Molina Pere
Vidal Bernaltes Marta
Vintu Stefan Silviu
Ye Wenxi
Publication date: 01/09/2014

Faculty of Pharmacy, Universitat de Barcelona. Degree: Bachelor's Degree in Pharmacy. Course: Pharmaceutical Botany. Academic year: 2013-2014. Coordinators: Joan Simon, Cèsar Blanché and Maria Bosch.

The materials presented here are a collection of 175 reports, each covering a botanical family of medicinal interest and each written individually. The reports were produced by all the students in groups M-2 and M-3 of the Pharmaceutical Botany course during April and May of the 2013-14 academic year. All the reports were written on the GoogleDocs platform, supervised by the course instructor, and reviewed and finally co-assessed by the students themselves. The main objective of the activity was to foster autonomous and collaborative learning in Pharmaceutical Botany.

Diposit Digital de la Universitat de Barcelona

Characterizing the impact of last-level cache replacement policies on big-data workloads

Author: Alvarez Lluc
Casas Marc
Jamet Alexandre Valentin
Publication date: 11/05/2023

In recent years, graph-processing has become an essential class of workloads with applications in a rapidly growing number of fields. Graph-processing typically uses large input sets, often in multi-gigabyte scale, and data-dependent graph traversal methods exhibiting irregular memory access patterns. Recent work demonstrates that, due to the highly irregular memory access patterns of data-dependent graph traversals, state-of-the-art graph-processing workloads spend up to 80% of the total execution time waiting for memory accesses to be served by the DRAM. The vast disparity between the Last Level Cache (LLC) and main memory latencies is a problem that has been addressed for years in computer architecture. One of the prevailing approaches to mitigating this performance gap between modern CPUs and DRAM is cache replacement policies. In this work, we characterize the challenges posed by graph-processing workloads and evaluate the most relevant cache replacement policies.

Comment: Extended abstract submitted to the 10th BSC Doctoral Symposium.

arXiv.org e-Print Archive